Harmonized Dense Knowledge Distillation Training for Multi-Exit Architectures

Authors

Abstract

Multi-exit architectures, in which a sequence of intermediate classifiers is introduced at different depths of the feature layers, perform adaptive computation by early-exiting "easy" samples to speed up inference. In this paper, a novel Harmonized Dense Knowledge Distillation (HDKD) training method for multi-exit architectures is designed to encourage each exit to flexibly learn from all of its later exits. In particular, a general dense knowledge distillation objective is proposed to incorporate all potentially beneficial supervision information into multi-exit learning, where a harmonized weighting scheme is designed for the multi-objective optimization problem consisting of the classification loss and the distillation loss. A bilevel optimization algorithm alternately updates the weights of the multiple objectives and the network parameters. Specifically, the weighting parameters are optimized with respect to performance on a validation set by gradient descent. Experiments on CIFAR100 and ImageNet show that the HDKD strategy harmoniously improves state-of-the-art multi-exit neural networks. Moreover, the method does not require any within-architecture modifications and can be effectively combined with other previously proposed training techniques to further boost performance.
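
As a rough illustration of the training objective described in the abstract, the sketch below combines per-exit classification losses with dense distillation losses from every deeper exit under a learnable weighting. It is a minimal PyTorch-style sketch under assumed names (dense_kd_loss, logits_per_exit, weights, temperature), not the authors' implementation.

import torch
import torch.nn.functional as F

def dense_kd_loss(logits_per_exit, targets, weights, temperature=4.0):
    """Weighted sum of per-exit classification losses and dense KD losses.

    logits_per_exit: list of [batch, num_classes] tensors, ordered shallow -> deep.
    targets:         [batch] ground-truth labels.
    weights:         1-D tensor with one entry per objective (K exits give
                     K classification terms plus K*(K-1)/2 distillation terms);
                     assumed to be updated by an outer, validation-driven loop.
    """
    objectives = []
    for i, student in enumerate(logits_per_exit):
        # Hard-label classification loss for exit i.
        objectives.append(F.cross_entropy(student, targets))
        # Dense distillation: exit i additionally learns from every deeper exit.
        for teacher in logits_per_exit[i + 1:]:
            soft_teacher = F.softmax(teacher.detach() / temperature, dim=1)
            log_student = F.log_softmax(student / temperature, dim=1)
            objectives.append(
                F.kl_div(log_student, soft_teacher, reduction="batchmean")
                * temperature ** 2
            )
    objectives = torch.stack(objectives)
    # "Harmonized" weighting (illustrative): softmax-normalize the weights so
    # they stay positive and sum to one before combining the objectives.
    return (torch.softmax(weights, dim=0) * objectives).sum()

In the bilevel scheme the abstract outlines, an inner loop would minimize this weighted loss over the training set with respect to the network parameters, while an outer loop updates weights by gradient descent on the exits' classification loss measured on a held-out validation set; the softmax normalization here is only one plausible way to keep the weights well-behaved, not necessarily the paper's exact scheme.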

Related articles

Solving Dense Generalized Eigenproblems on Multi-threaded Architectures

We compare two approaches to compute a fraction of the spectrum of dense symmetric definite generalized eigenproblems: one is based on the reduction to tridiagonal form, and the other on the Krylov-subspace iteration. Two large-scale applications, arising in molecular dynamics and material science, are employed to investigate the contributions of the application, architecture, and parallelism o...

Deep Learning in Multi-Layer Architectures of Dense Nuclei

In dense clusters of neurons in nuclei, cells may interconnect via soma-to-soma interactions, in addition to conventional synaptic connections. We illustrate this idea with a multi-layer architecture (MLA) composed of multiple clusters of recurrent sub-networks of spiking Random Neural Networks (RNN) with dense soma-to-soma interactions. We use this RNN-MLA architecture for deep learning. The i...

Sequence-Level Knowledge Distillation

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...

HamleDT: Harmonized multi-language dependency treebank

We present HamleDT – a HArmonized Multi-LanguagE Dependency Treebank. HamleDT is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. In the present article, we provide a thorough investigation and discussion of a number of phenomena that are comparable across languages, though their ann...

Knowledge Distillation for Bilingual Dictionary Induction

Leveraging zero-shot learning to learn mapping functions between vector spaces of different languages is a promising approach to bilingual dictionary induction. However, methods using this approach have not yet achieved high accuracy on the task. In this paper, we propose a bridging approach, where our main contribution is a knowledge distillation training objective. As teachers, rich resource ...


Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i11.17225